
Conversation

@Dan-Flores (Contributor):
This PR creates a simple VideoEncoder class and updates several tests to use the VideoEncoder.to_file pattern (a sketch of that pattern follows the list below).

  • test_bad_input_parameterized is the VideoEncoder equivalent of the AudioEncoder test that ensures general error checking occurs, while test_bad_input covers method-specific errors.

  • test_contiguity is the VideoEncoder equivalent of the AudioEncoder test that ensures contiguous and non-contiguous tensors can both be encoded, and are encoded equivalently.
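
A minimal sketch of the VideoEncoder.to_file pattern exercised by the updated tests. The import path and exact keyword names are assumptions based on the docstrings discussed below, not verified against the final code:

    import torch
    from torchcodec.encoders import VideoEncoder  # import path assumed

    # 10 RGB frames, uint8, shape (N, C, H, W) as required by the docstring
    frames = torch.randint(0, 256, (10, 3, 64, 64), dtype=torch.uint8)

    encoder = VideoEncoder(frames, frame_rate=30)  # frame rate of the input frames
    encoder.to_file("output.mp4")  # container format inferred from the extension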

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) · Oct 21, 2025
@Dan-Flores marked this pull request as ready for review · October 28, 2025 14:35
@NicolasHug (Contributor) left a comment:

Thanks @Dan-Flores, looks great. I made a quick first pass.

frames (``torch.Tensor``): The frames to encode. This must be a 4D
tensor of shape ``(N, C, H, W)`` where N is the number of frames,
C is 3 channels (RGB), H is height, and W is width.
A 3D tensor of shape ``(C, H, W)`` is also accepted as a single RGB frame.
Contributor:
        A 3D tensor of shape ``(C, H, W)`` is also accepted as a single RGB frame.

Q - Do we need to support that? I'm wondering whether it makes much sense to encode a single image as a video. I suspect this was made to mimic the AudioEncoder behavior, but that was a different use-case: in the AudioEncoder we want to allow 1D audio to be supported since it's still a valid waveform. But I don't think we need to treat a single frame as a valid video.

@Dan-Flores (Contributor Author):

I'm not sure what the use case is for encoding an image as a video, but since FFmpeg allows encoding an image to video, I believe we can retain this functionality at a relatively low cost.

@NicolasHug (Contributor) · Oct 29, 2025:

I'm not sure what the use case is for encoding an image as a video

I'm not sure either - let's not do it then :). One principle we have is that we should only implement things that we know are useful. When in doubt, it's best not to expose it; that makes our life easier as maintainers. Of course, if users start to explicitly request this feature, we may reconsider.

In the near future we will start implementing proper image encoders in torchcodec, and another guiding principle is that there should be one-- and preferably only one --obvious way to do it. (ref). If we were to support 3D tensors (images) into the VideoEncoder then we'd potentially expose two ways to do the same thing.

It doesn't mean we'll prevent users from doing it if they really want to: a user can still just call unsqueeze(0) on their 3D tensor and pass it. It will work. But we don't need to explicitly support 3D tensors on our side.
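
For illustration, a hedged sketch of that workaround, assuming the 4D-only API and an import path that may differ:

    import torch
    from torchcodec.encoders import VideoEncoder  # import path assumed

    single_frame = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)  # (C, H, W)
    frames = single_frame.unsqueeze(0)  # add a batch dimension -> (1, C, H, W)
    VideoEncoder(frames, frame_rate=1).to_file("one_frame.mp4")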

Comment on lines 19 to 20
frame_rate (int): The frame rate to use when encoding the
**input** ``frames``.
Contributor:

My interpretation of this description is that this parameter actually defines the frame rate of the encoded output, because of the "frame rate to use when encoding" part.

I think it might be less ambiguous as

The frame rate of the input frames. This is not the frame rate of the encoded output.

@Dan-Flores (Contributor Author):

We use this variable in Encoder.cpp: avCodecContext_->framerate = {inFrameRate_, 1};.
Perhaps the naming is confusing (it should be inputFrameRate rather than inFrameRate), but is it not accurate to say this defines the frame rate of the encoded output?

Contributor:

Good point, you're right, frame_rate and inFrameRate currently define both the frame rate of the input and the frame rate of the encoded output.

But in a future implementation, we might let the user choose to encode the output with a different frame rate. If/when that happens, this frame_rate parameter will only be about the input frame rate, while the frame_rate parameter that we'll add to the encoding methods (to_file(frame_rate=...)) will define the output frame rate. It will be similar to what we do in the AudioEncoder with the sample_rate parameters.

My original docstring suggestion was definitely wrong (thanks for catching!), but I'd still suggest clarifying that this is really the input frame rate and that, for now, it also happens to be what we use for the output frame rate:

The frame rate of the input frames. Also defines the encoded output frame rate.
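
As a hedged illustration of the AudioEncoder analogy mentioned above (the signatures below are assumptions, not checked against the current API): the constructor parameter describes the input, while the encoding method can request a different output rate.

    import torch
    from torchcodec.encoders import AudioEncoder  # import path assumed

    samples = torch.rand(2, 16_000) * 2 - 1              # 1 second of stereo audio in [-1, 1]
    encoder = AudioEncoder(samples, sample_rate=16_000)  # sample rate of the *input* samples
    encoder.to_file("out.wav", sample_rate=44_100)       # requested *output* sample rate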

@Dan-Flores (Contributor Author) · Oct 29, 2025:

Thanks for clarifying this - I'll make a note to add frame_rate to the encoding methods (to_file, etc.) next.

Args:
dest (str or ``pathlib.Path``): The path to the output file, e.g.
``video.mp4``. The extension of the file determines the video
format and container.
Contributor:

What's the distinction between "format" and "container" here? I would just use one or the other?

@Dan-Flores (Contributor Author):

The terms are used interchangeably, so I included both here to be understandable to users of either term. Let me know if that is actually more confusing than just using one term.

I might have added the distinction after discovering that the mkv format is defined as a Matroska container. I have not encountered other formats where these differ.

@NicolasHug (Contributor) · Oct 29, 2025:

I feel like saying

determines the video format and container.

may suggest to some readers that "format" and "container" refer to different things, especially because a) "format" is sometimes used to refer to the codec, and b) in the methods we have only a single "format" parameter and we never mention the "container".

How about using the term "container format" everywhere in the docstrings? I think that is the actual correct technical term for that (wikipedia):

Notable examples of container formats include [...] formats used for multimedia playback (such as Matroska, MP4, and AVI).

The parameter can still be format - I think this makes sense for consistency with the AudioEncoder, but in the docstrings of all methods we could clarify that format corresponds to the "container format"?
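
For illustration, a small sketch of how the container format follows from the file extension (signatures assumed from this PR's docstrings, not verified):

    import torch
    from torchcodec.encoders import VideoEncoder  # import path assumed

    frames = torch.randint(0, 256, (30, 3, 128, 128), dtype=torch.uint8)
    encoder = VideoEncoder(frames, frame_rate=30)

    encoder.to_file("clip.mp4")  # MP4 container format
    encoder.to_file("clip.mkv")  # Matroska container format (".mkv" extension)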

@Dan-Flores merged commit d63504c into meta-pytorch:main · Oct 29, 2025
59 checks passed
@Dan-Flores deleted the vid_encoder_python branch · October 29, 2025 20:26
